Clustering of SNPs along a chromosome: can the neutral model be rejected?
Authors
Abstract
Single nucleotide polymorphisms (SNPs) often appear in clusters along the length of a chromosome. This is due to variation in local coalescent times caused by, for example, selection or recombination. Here we investigate whether recombination alone (within a neutral model) can cause statistically significant SNP clustering. We measure the extent of SNP clustering as the ratio between the variance of the number of SNPs found in bins of length $l$ and the mean number of SNPs in such bins, $\sigma_l^2/\mu_l$. For a uniform SNP distribution $\sigma_l^2/\mu_l = 1$; for clustered SNPs $\sigma_l^2/\mu_l > 1$. Apart from the bin length, three length scales are important when accounting for SNP clustering: the mean distance between neighboring SNPs, $\Delta$; the mean length of chromosome segments with constant time to the most recent common ancestor, $l_{\mathrm{seg}}$; and the total length of the chromosome, $L$. We show that SNP clustering is observed if $\Delta < l_{\mathrm{seg}} \ll L$. Moreover, if $l \ll l_{\mathrm{seg}} \ll L$, clustering becomes independent of the rate of recombination. We apply our results to the analysis of SNP data sets from mice and from human chromosomes 6 and X. Of the three data sets investigated, the human X chromosome displays the most significant deviation from neutrality.

INTRODUCTION

Single nucleotide polymorphisms (SNPs) are the most abundant polymorphisms in most populations. Due to their ubiquity and stability they are useful in the diagnosis of human diseases (ZHOU et al., 2002), detection of human disease genes (WILLEY et al., 2002), and gene mapping in organisms as diverse as humans (MCINNES et al., 2001), Arabidopsis thaliana (CHO et al., 1999), and Drosophila (BERGER et al., 2001). For this reason, several large-scale SNP-mapping projects are currently under way in eukaryotic model organisms including A. thaliana (http://arabidopsis.org/Cereon), Drosophila (HOSKINS et al., 2001), mouse (LINDBLAD-TOH et al., 2000), and human (INTERNATIONAL HUMAN GENOME SEQUENCING CONSORTIUM, 2001; THE INTERNATIONAL SNP MAP WORKING GROUP, 2001).
A central question in the analysis of data collected in the context of these projects is how SNPs
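The clustering statistic defined in the abstract, the ratio of the variance to the mean of SNP counts in bins of length l, can be illustrated with a short sketch. This is not the authors' code: the function name `dispersion_index`, the use of the sample variance (dividing by the number of bins minus one), and the choice to ignore any partial bin at the end of the chromosome are all assumptions made for illustration.

```python
import random


def dispersion_index(positions, chrom_length, bin_length):
    """Variance-to-mean ratio of SNP counts in bins of length bin_length.

    Hypothetical helper: positions are SNP coordinates in [0, chrom_length).
    SNPs falling in a partial final bin are ignored (an assumption).
    """
    n_bins = int(chrom_length // bin_length)
    if n_bins < 2:
        raise ValueError("need at least two full bins")

    # Count the SNPs that fall into each full bin.
    counts = [0] * n_bins
    for x in positions:
        idx = int(x // bin_length)
        if 0 <= idx < n_bins:
            counts[idx] += 1

    mean = sum(counts) / n_bins
    if mean == 0:
        raise ValueError("no SNPs fall inside the binned region")

    # Sample variance of the bin counts (divides by n_bins - 1; an assumption).
    variance = sum((c - mean) ** 2 for c in counts) / (n_bins - 1)
    return variance / mean


if __name__ == "__main__":
    rng = random.Random(1)
    L = 1_000_000   # chromosome length (arbitrary illustrative value)
    l = 1_000       # bin length (arbitrary illustrative value)

    # SNPs placed uniformly at random: the ratio should be close to 1.
    uniform = [rng.uniform(0, L) for _ in range(5_000)]
    print("uniform:", dispersion_index(uniform, L, l))

    # Same number of SNPs confined to the first half: the ratio exceeds 1.
    clustered = [rng.uniform(0, L / 2) for _ in range(5_000)]
    print("clustered:", dispersion_index(clustered, L, l))
```

In this toy comparison the only source of clustering is the artificial restriction of SNPs to half of the sequence; it does not model coalescent times, recombination, or selection.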
Similar resources
Comparison of two QTL mapping approaches based on Bayesian inference using high-dense SNPs markers
To compare different QTL mapping methods, a population with genotypic and phenotypic data was simulated. In Bayesian approach, all information of markers can be used along with combination of distributions of SNP markers. It is assumed that most of the markers (95%) have minor effects and a few numbers of markers (5%) exert major effects. The simulated population included a basic population of ...
DHPLC Applications: Finding DNA Variation on the Y Chromosome
Denaturing High-Performance Liquid Chromatography (DHPLC) is a recently developed technique for the detection of single nucleotide polymorphisms (SNPs) and mutations. It involves the comparison between two or more DNAs as a mixture of denatured and reannealed PCR products. The methodology is based on the principle of reversed phase liquid chromatography and uses a unique DNA sepa...
Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model
Single Nucleotide Polymorphisms (SNPs) are the most usual form of polymorphism in human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. The particular pattern of these common variations forms a block-like structure on human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
A model for the clustered distribution of SNPs in the human genome
Motivated by a non-random but clustered distribution of SNPs, we introduce a phenomenological model to account for the clustering properties of SNPs in the human genome. The phenomenological model is based on a preferential mutation to the closer proximity of existing SNPs. With the Hapmap SNP data, we empirically demonstrate that the preferential model is better for illustrating the clustered ...
Journal title:
Volume / issue:
Pages: -
Publication date: 2002